Towards an Indonesian-English SMT System: A Case Study of an Under-Studied and Under-Resourced Language, Indonesian

Author

  • S. D. Larasati
Abstract

This paper describes work on preparing an Indonesian-English Statistical Machine Translation (SMT) system. It covers the creation of an Indonesian morphological analyzer, MorphInd, and the compilation of an Indonesian-English parallel corpus, IDENTIC. We build an SMT system using the state-of-the-art phrase-based SMT toolkit MOSES. We present several scenarios in which the morphological tool is used to incorporate morphological information into the SMT system trained on the compiled parallel corpus.
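
One common way to bring morphological information into a phrase-based pipeline is to segment words before training. The following is a minimal sketch of that idea for the Indonesian enclitics -ku, -mu, and -nya. The stem lexicon, the function names, and the "-clitic" output convention are all hypothetical; they do not reflect MorphInd's actual interface or the specific scenarios evaluated in the paper.

```python
# Toy illustration only. ENCLITICS lists three real Indonesian enclitics;
# STEMS is a hypothetical stand-in for a real analyzer's lexicon.
ENCLITICS = ("nya", "ku", "mu")
STEMS = {"buku", "rumah", "nama"}


def split_enclitic(token):
    """Return [stem, '-clitic'] when the token is a known stem plus an
    enclitic; otherwise return the token unchanged in a one-item list."""
    lower = token.lower()
    for clitic in ENCLITICS:
        stem = lower[: -len(clitic)]
        # Require a known stem so that words such as "buku" (book),
        # which merely end in "ku", are not split by mistake.
        if lower.endswith(clitic) and stem in STEMS:
            return [token[: -len(clitic)], "-" + clitic]
    return [token]


def preprocess(sentence):
    """Apply enclitic splitting to every whitespace-separated token."""
    pieces = []
    for token in sentence.split():
        pieces.extend(split_enclitic(token))
    return " ".join(pieces)


if __name__ == "__main__":
    print(preprocess("bukuku di rumahnya"))
```

The lexicon check is the important design choice here: purely suffix-based splitting would over-segment ordinary words, which is why a real system would rely on a morphological analyzer rather than a string match.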


Similar resources

Naturalization in Translation: A Case Study on the Translation of English-Indonesian Medical Terms

Naturalization is a translation procedure that is predominantly utilized in the translation of English medical terms into Indonesian. This study focuses on identifying types of naturalization involving the adjustment of spelling and pronunciation and investigating whether naturalization has been appropriately applied based on the rules in the Indonesian general guidance of term formation. The d...

Unsupervised Word Class Induction for Under-resourced Languages: A Case Study on Indonesian

In this study we investigate how we can learn both: (a) syntactic classes that capture the range of predicate argument structures (PASs) of a word and the syntactic alternations it participates in, but ignore large semantic differences in the component words; and (b) syntactico-semantic classes that capture PAS and alternation properties, but are also semantically coherent (a la Levin classes)....

Handling Indonesian Clitics: A Dataset Comparison for an Indonesian-English Statistical Machine Translation System

In this paper, we study the effect of incorporating morphological information on an Indonesian (id) to English (en) Statistical Machine Translation (SMT) system as part of a preprocessing module. The linguistic phenomenon that is being addressed here is Indonesian cliticized words. The approach is to transform the text by separating the correct clitics from a cliticized word to simplify the wor...

IIT Bombay's English-Indonesian submission at WAT: Integrating Neural Language Models with SMT

This paper describes IIT Bombay's submission to the WAT 2016 shared task for the English–Indonesian language pair. Results are reported for both directions of the language pair. Among the approaches tried, the Operation Sequence Model (OSM) and a Neural Language Model were submitted to WAT. The OSM approach integrates translation and reordering process re...


Publication date: 2012